The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017
Abstract
We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems: one was based on a statistical approach using only lexical features, and the other four on machine-learning techniques using combinations of gender-specific Russian grammatical features, word and character n-grams, and suffix n-grams. Our systems achieved the highest weighted accuracy across all the test datasets, occupying the first four places in the ranking.
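The abstract names three n-gram feature families (word, character, and suffix n-grams). The Python sketch below shows one way such features could be extracted into a sparse bag of counts. It is an illustration only, not the CIC systems' actual configuration: the function names, the chosen n values, the regex tokenizer, and the `w:`/`c:`/`sN:` key prefixes are all hypothetical. Suffix n-grams are one plausible carrier of gender signal in Russian, because past-tense verbs agree with the subject's gender (сделал vs. сделала).

```python
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware for str patterns in Python 3, so Cyrillic letters
    # are treated as word characters; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def word_ngrams(tokens, n):
    # Contiguous sequences of n tokens, joined with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    # Contiguous sequences of n characters over the raw string, spaces included.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def suffix_ngrams(tokens, n):
    # The last n characters of every token that is at least n characters long.
    return [tok[-n:] for tok in tokens if len(tok) >= n]


def extract_features(text, word_n=(1, 2), char_n=(3,), suffix_n=(2, 3)):
    """Return a sparse bag of prefixed n-gram counts for one document (hypothetical scheme)."""
    tokens = tokenize(text)
    lowered = text.lower()
    feats = Counter()
    for n in word_n:
        feats.update("w:" + g for g in word_ngrams(tokens, n))
    for n in char_n:
        feats.update("c:" + g for g in char_ngrams(lowered, n))
    for n in suffix_n:
        feats.update(f"s{n}:" + g for g in suffix_ngrams(tokens, n))
    return feats


if __name__ == "__main__":
    feats = extract_features("Я вчера сделала доклад")
    # The feminine past-tense ending of "сделала" shows up as suffix features.
    print(feats["s2:ла"], feats["s3:ала"])  # prints: 1 1
```

Because `Counter` is a `dict` subclass, a list of these per-document feature dictionaries could be passed to a sparse vectorizer such as scikit-learn's `DictVectorizer` before training any classifier.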
Similar resources
Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian
Author profiling consists of predicting an author’s traits (e.g. age, gender, personality) from her writing. After mainly addressing age and gender identification at PAN@CLEF, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting an author’s gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...
Cross-genre Gender Identification in Russian Texts Using Topic Modeling Working Note: Team DUBL
In this paper, we describe the results of gender identification from Team DUBL. We used a topic modeling approach for identifying the author’s gender based on his/her written texts. The model was trained on the RusProfiling PAN 2017 Twitter Corpus that contains data in the Russian language. The model has been evaluated on texts of other genres, including texts such as letters to a friend, online...
Representation of Target Classes for Text Classification - AMRITA_CEN_NLP@RusProfiling PAN 2017
This working note describes the system we used while participating in the RusProfiling PAN 2017 shared task. The objective of the task is to identify the gender trait of the author from the author’s text written in the Russian language. Treating this as a binary text classification problem, we experimented with developing a representation scheme for target classes (called class vectors) from the text...
AmritaNLP@PAN-RusProfiling: Author Profiling using Machine Learning Techniques
This paper illustrates work done on "Gender Identification in Russian texts (RusProfiling)" shared task, hosted by PAN in conjunction with FIRE 2017. The task is to predict the author’s gender, based on the Twitter data corpus which is in Russian. We will give a brief introduction to the task at hand, elaborate on the data-set provided by the competition organizers, discuss various feature select...
Gender Identification in Russian Texts
Gender identification is a task where we have to identify the gender of the author of written texts. A hybrid approach has been designed by combining a deep neural network and a rule-based classifier for Russian texts. LSTM and BiLSTM have been used as part of the neural network due to their capability to learn long-term dependencies.